Partial Integers

Learn about digital search trees and their operations.

Bits as part of integers#

In this module, we return to the problem of implementing an SSet. The difference now is that we assume the elements stored in the SSet are ww-bit integers. That is, we want to implement add(x), remove(x), and find(x) where x{0,...,2w1}x \in \{0,...,2^w−1\}. It is not too hard to think of plenty of applications where the data—or at least the key that we use for sorting the data—is an integer.

We discuss three data structures, each building on the ideas of the previous. The first structure, the BinaryTrie performs all three SSet operations in O(w)O(w) time. This is not very impressive, since any subset of {0,...,2w1}\{0,...,2^w − 1\} has size n2wn \leq 2^w , so that log nwlog\space n \leq w. The other SSet implementations perform all operations in O(logn)O(\log n) time so they are all at least as fast as a BinaryTrie.

The second structure, the XFastTrie, speeds up the search in a BinaryTrie by using hashing. With this speedup, the find(x) operation runs in O(logw)O(\log w) time. However, add(x) and remove(x) operations in an XFastTrie still take O(w)O(w) time and the space used by an XFastTrie is O(nw)O(n \cdot w).

The third data structure, the YFastTrie, uses an XFastTrie to store only a sample of roughly one out of every w elements and stores the remaining elements in a standard SSet structure. This trick reduces the running time of add(x) and remove(x) to O(logw)O(\log w) and decreases the space to O(n)O(n).

The implementations used as examples can store any type of data, as long as an integer can be associated with it. In the code samples, the variable ix is always the integer value associated with x, and the method in.intValue(x) converts x to its associated integer. In the text, however, we will simply treat x as if it is an integer.

The integers stored in a binary trie are encoded as root-to-leaf paths
The integers stored in a binary trie are encoded as root-to-leaf paths

BinaryTrie: A digital search tree#

A BinaryTrie encodes a set of ww bit integers in a binary tree. All leaves in the tree have depth ww and each integer is encoded as a root-to-leaf path. The path for the integer x turns left at level i if the iith most significant bit of x is a 00 and turns right if it is a 11. The above illustration shows an example for the case w=4w = 4, in which the trie stores the integers 3(0011),9(1001),12(1100),3(0011), 9(1001), 12(1100), and 13(1101)13(1101).

Because the search path for a value x depends on the bits of x, it will be helpful to name the children of a node, u, u.child[0] (left) and u.child[1] (right). These child pointers will actually serve double-duty. Since the leaves in a binary trie have no children, the pointers are used to string the leaves together into a doubly-linked list. For a leaf in the binary trie u.child[0] (prev) is the node that comes before u in the list and u.child[1] (next) is the node that follows u in the list. A special node, dummy, is used both before the first node and after the last node in the list.

A BinaryTrie with jump pointers shown as dashed edges
A BinaryTrie with jump pointers shown as dashed edges

Each node, u, also contains an additional pointer u.jump. If u’s left child is missing, then u.jump points to the smallest leaf in u’s subtree. If u’s right child is missing, then u.jump points to the largest leaf in u’s subtree. An example of a BinaryTrie, showing jump pointers and the doubly-linked list at the leaves, is shown in the illustration.

The find(x) operation in a BinaryTrie is fairly straightforward. We try to follow the search path for x in the trie. If we reach a leaf, then we have found x. If we reach a node u where we cannot proceed (because u is missing a child), then we follow u.jump, which takes us either to the smallest leaf larger than x or the largest leaf smaller than x. Which of these two cases occurs depends on whether u is missing its left or right child, respectively. In the former case (u is missing its left child), we have found the node we want. In the latter case (u is missing its right child), we can use the linked list to reach the node we want.

Visual demonstration#

Each of these cases are illustrated in the illustration below.

Created with Fabric.js 3.6.6

1 of 9

Created with Fabric.js 3.6.6

2 of 9

Created with Fabric.js 3.6.6

3 of 9

Created with Fabric.js 3.6.6

4 of 9

Created with Fabric.js 3.6.6

5 of 9

Created with Fabric.js 3.6.6

6 of 9

Created with Fabric.js 3.6.6

7 of 9

Created with Fabric.js 3.6.6

8 of 9

Created with Fabric.js 3.6.6

9 of 9

The implementation of the find() method is:

The find() method

The running time of the find(x) method is dominated by the time it takes to follow a root-to-leaf path, so it runs in O(w)O(w) time.

The add() operation#

The add(x) operation in a BinaryTrie is also fairly straightforward, but has a lot of work to do:

  1. It follows the search path for x until reaching a node u where it can no longer proceed.
  2. It creates the remainder of the search path from u to a leaf that contains x.
  3. It adds the node, uu', containing x to the linked list of leaves (it has access to the predecessor, pred, of uu' in the linked list from the jump pointer of the last node, u, encountered during step 1.)
  4. It walks back up the search path for x adjusting jump pointers at the nodes whose jump pointer should now point to x.

An addition is illustrated below:

Adding the values 2 and 15 to the BinaryTrie
Adding the values 2 and 15 to the BinaryTrie

The implementation of the add() method is:

The add() method

This method performs one walk down the search path for x and one walk back up. Each step of these walks takes constant time, so the add(x) method runs in O(w)O(w) time.

The remove() operation#

The remove(x) operation undoes the work of add(x). Like add(x), it has a lot of work to do:

  1. It follows the search path for x until reaching the leaf, u, containing x.
  2. It removes u from the doubly-linked list.
  3. It deletes u and then walks back up the search path for x deleting nodes until reaching a node v that has a child that is not on the search path for x.
  4. It walks upwards from v to the root updating any jump pointers that point to u.

A removal is illustrated in below illustration.

Removing the value 9 from the BinaryTrie
Removing the value 9 from the BinaryTrie

The implementation of the remove() method is:

The remove() method

Theorem: A BinaryTrie implements the SSet interface for w-bit integers. A BinaryTrie supports the operations add(x), remove(x), and find(x) in O(w)O(w) time per operation. The space used by a BinaryTrie that stores nn values is O(nw)O(n \cdot w).

Let’s now create an instance of a BinaryTrie class and perform easy tests on it.

main.py
base.py
utils.py
Implementation of BinaryTrie

Explanation

Let’s start our explanation from the start of main().

  • Line 195: A new BinaryTrie class instance is created and stored in the variable bt. The BinaryTrie class is designed to hold a collection of integers, using a binary trie data structure.

  • Line 197–202: The integers 5, 3, 8, 7, 2, and 1 are added to the binary trie by calling the add() method on bt. Each time add() is called, a new node is created in the binary trie, and the integer being added is inserted into the trie.

  • Line 207–209: The find() method is then called on the binary trie to search for the values 5, 2, and 10. The find() method searches the binary trie for a node with the specified value and returns True if the value is found and False otherwise.

  • Line 211–213: The integers 5, 7, and 1 are removed from the binary trie by calling the remove method on bt. Each time remove() is called, the node with the specified value is removed from the binary trie.

  • Line 218–220: The find() method is called again to search for the values 5, 7, and 1 in the updated binary trie. Since these values were removed in the previous step, the find() method will return False for each of these values.

Solution: Graphs

Doubly-Logarithmic Time